Context informs pragmatic interpretation in vision-language models

Tan, Alvin Wei Ming, Prystawski, Ben, Boyce, Veronica, Frank, Michael C.

arXiv.org Artificial Intelligence

Iterated reference games - in which players repeatedly pick out novel referents using language - present a test case for agents' ability to perform context-sensitive pragmatic reasoning in multi-turn linguistic environments. We tested humans and vision-language models on trials from iterated reference games, varying the given context in terms of amount, order, and relevance. Without relevant context, models were above chance but substantially worse than humans. However, with relevant context, model performance increased dramatically over trials. Few-shot reference games with abstract referents remain a difficult task for machine learning models.


Efficient Last-Iterate Convergence in Regret Minimization via Adaptive Reward Transformation

Ren, Hang, Wu, Yulin, Qi, Shuhan, Zhang, Jiajia, Sun, Xiaozhen, Ma, Tianzi, Wang, Xuan

arXiv.org Artificial Intelligence

Regret minimization is a powerful method for finding Nash equilibria in Normal-Form Games (NFGs) and Extensive-Form Games (EFGs), but it typically guarantees convergence only for the average strategy. However, computing the average strategy requires significant computational resources or introduces additional errors, limiting its practical applicability. The Reward Transformation (RT) framework was introduced into regret minimization to achieve last-iterate convergence through reward function regularization. However, it faces practical challenges: its performance is highly sensitive to manually tuned parameters, which often deviate from theoretical convergence conditions, leading to slow convergence, oscillations, or stagnation in local optima. Inspired by prior work, we propose an adaptive technique that addresses these issues, ensuring better consistency between theoretical guarantees and practical performance for RT Regret Matching (RTRM), RT Counterfactual Regret Minimization (RTCFR), and their variants in solving NFGs and EFGs. Our adaptive methods dynamically adjust parameters, balancing exploration and exploitation while improving regret accumulation, ultimately achieving asymptotic last-iterate convergence at a linear rate. Experimental results demonstrate that our methods significantly accelerate convergence, outperforming state-of-the-art algorithms.
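As background for the RT variants above, plain Regret Matching in self-play illustrates why only the *average* strategy converges. The sketch below runs it on rock-paper-scissors (a minimal toy, not the authors' RTRM; the payoff matrix and iteration count are illustrative choices):

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum).
A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)

def rm_strategy(regrets):
    """Strategy proportional to positive cumulative regrets (uniform if none are positive)."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))

def self_play(iters=20000):
    r1 = np.zeros(3); r2 = np.zeros(3)      # cumulative regrets
    avg1 = np.zeros(3); avg2 = np.zeros(3)  # strategy sums for averaging
    for _ in range(iters):
        s1, s2 = rm_strategy(r1), rm_strategy(r2)
        avg1 += s1; avg2 += s2
        u1 = A @ s2            # expected payoff of each pure action for player 1
        u2 = -A.T @ s1         # zero-sum: player 2's payoffs are negated
        r1 += u1 - s1 @ u1     # accumulate instantaneous regrets
        r2 += u2 - s2 @ u2
    return avg1 / iters, avg2 / iters

avg1, avg2 = self_play()
print(np.round(avg1, 3))  # average strategy approaches the uniform equilibrium
```

The current iterates `s1`, `s2` keep cycling; only the averages approach the equilibrium. RT-style reward regularization targets exactly this gap by making the last iterate itself converge.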




Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment

Wang, Mingzhi, Ma, Chengdong, Chen, Qizhi, Meng, Linjian, Han, Yang, Xiao, Jiancong, Zhang, Zhaowei, Huo, Jing, Su, Weijie J., Yang, Yaodong

arXiv.org Artificial Intelligence

Self-play methods have demonstrated remarkable success in enhancing model capabilities across various domains. In the context of Reinforcement Learning from Human Feedback (RLHF), self-play not only boosts Large Language Model (LLM) performance but also overcomes the limitations of traditional Bradley-Terry (BT) model assumptions by finding the Nash equilibrium (NE) of a preference-based, two-player constant-sum game. However, existing methods either guarantee only average-iterate convergence, incurring high storage and inference costs, or converge to the NE of a regularized game, failing to accurately reflect true human preferences. In this paper, we introduce Magnetic Preference Optimization (MPO), a novel approach capable of achieving last-iterate convergence to the NE of the original game, effectively overcoming the limitations of existing methods. Building upon Magnetic Mirror Descent (MMD), MPO attains a linear convergence rate, making it particularly suitable for fine-tuning LLMs. To ensure our algorithm is both theoretically sound and practically viable, we present a simple yet effective implementation that adapts the theoretical insights to the RLHF setting. Empirical results demonstrate that MPO can significantly enhance the performance of LLMs, highlighting the potential of self-play methods in alignment.
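The Magnetic Mirror Descent building block that MPO extends has a simple closed form under an entropy mirror map. Below is a minimal sketch on rock-paper-scissors with a fixed uniform magnet (an illustrative toy under assumed step sizes, not the RLHF implementation; MPO's contribution is in addition updating the magnet so the last iterate reaches the NE of the original rather than the regularized game):

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum).
A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)

def mmd_step(pi, q, magnet, eta, alpha):
    """One MMD update with entropy mirror map and KL 'magnet' regularization.
    Closed form: pi'(a) ∝ (pi(a) * exp(eta*q(a)) * magnet(a)**(eta*alpha))**(1/(1+eta*alpha))."""
    logits = (np.log(pi) + eta * q + eta * alpha * np.log(magnet)) / (1 + eta * alpha)
    p = np.exp(logits - logits.max())  # shift for numerical stability
    return p / p.sum()

def run(iters=5000, eta=0.04, alpha=0.5):
    magnet = np.full(3, 1/3)
    p1 = np.array([0.8, 0.1, 0.1])  # deliberately off-equilibrium start
    p2 = np.array([0.1, 0.8, 0.1])
    for _ in range(iters):
        q1 = A @ p2
        q2 = -A.T @ p1
        p1 = mmd_step(p1, q1, magnet, eta, alpha)
        p2 = mmd_step(p2, q2, magnet, eta, alpha)
    return p1, p2

p1, p2 = run()
print(np.round(p1, 3))  # the last iterate itself converges, no averaging needed
```

With a fixed uniform magnet, the last iterate converges linearly to the regularized equilibrium (here uniform, by symmetry); the abstract's point is that moving the magnet lets this convergence target the original game's NE.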


Reviews: A Unified Framework for Extensive-Form Game Abstraction with Bounds

Neural Information Processing Systems

This paper advances a line of work exploring how to approximate the Nash equilibrium of a game that's too large to compute directly. The idea is to create a smaller abstraction of the game by combining information sets, solve for equilibrium in the smaller game, then map the solution back to the original game. The topic relates to NIPS since this is a state-of-the-art method to program game-playing AI agents like poker bots. The authors prove new bounds on the error of the approximation that are very general. They provide the first general proof that an ε′-Nash equilibrium in an abstraction leads to an ε-Nash equilibrium in the original game.


Abstracting Imperfect Information Away from Two-Player Zero-Sum Games

Sokota, Samuel, D'Orazio, Ryan, Ling, Chun Kai, Wu, David J., Kolter, J. Zico, Brown, Noam

arXiv.org Artificial Intelligence

In their seminal work, Nayyar et al. (2013) showed that imperfect information can be abstracted away from common-payoff games by having players publicly announce their policies as they play. This insight underpins sound solvers and decision-time planning algorithms for common-payoff games. Unfortunately, a naive application of the same insight to two-player zero-sum games fails because Nash equilibria of the game with public policy announcements may not correspond to Nash equilibria of the original game. As a consequence, existing sound decision-time planning algorithms require complicated additional mechanisms that have unappealing properties. The main contribution of this work is showing that certain regularized equilibria do not possess the aforementioned non-correspondence problem -- thus, computing them can be treated as a perfect-information problem. Because these regularized equilibria can be made arbitrarily close to Nash equilibria, our result opens the door to a new perspective on solving two-player zero-sum games and yields a simplified framework for decision-time planning in such games, devoid of the unappealing properties that plague existing decision-time planning approaches.


VR Assassin's Creed, Stranger Things and Ghostbusters arrive on Meta Quest later this year

Engadget

Meta announced a slate of upcoming games today for its standalone VR headsets (including the upcoming Meta Quest 3). Apple is expected to enter the virtual headset space next week, so Meta is hoping to make a lasting impression with its lineup of upcoming VR titles from beloved franchises, including Assassin's Creed, Stranger Things, Ghostbusters and Attack on Titan -- along with some VR remakes of old-school classics. In addition to Asgard's Wrath 2, the most enticing game may be the one we know the least about. Although it was little more than a tease, Meta confirmed that Assassin's Creed Nexus VR isn't vaporware after all: The next VR installment in the long-running series will launch in the Meta Quest Store later this year. Unfortunately, further details must wait for its official reveal at Ubisoft Forward on June 12th.